
Conversation


@cboss6 cboss6 commented Sep 27, 2025

Description:
PR-23745 introduced the round-robin expert placement strategy for MoE models with multiple expert groups, providing a simple yet effective way to distribute experts evenly across devices.
This PR extends that work by ensuring full compatibility with EPLB (Expert Parallel Load Balancing). With this enhancement, round-robin placement can now be seamlessly combined with dynamic expert load balancing, enabling more flexible expert scheduling while maintaining balanced utilization and performance.
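For illustration, here is a minimal sketch of the difference between the two placement strategies (the function names and signature below are hypothetical, not the exact vLLM helpers):

```python
# Hypothetical sketch: map global expert IDs to expert-parallel (EP) ranks.
# Assumes num_experts is divisible by ep_size for simplicity.

def linear_placement(num_experts: int, ep_size: int) -> list[list[int]]:
    """Contiguous chunks: rank r holds experts [r*k, (r+1)*k)."""
    k = num_experts // ep_size
    return [list(range(r * k, (r + 1) * k)) for r in range(ep_size)]

def round_robin_placement(num_experts: int, ep_size: int) -> list[list[int]]:
    """Strided assignment: expert e goes to rank e % ep_size."""
    return [list(range(r, num_experts, ep_size)) for r in range(ep_size)]

# Example with 8 experts over 4 EP ranks:
#   linear:      [[0, 1], [2, 3], [4, 5], [6, 7]]
#   round-robin: [[0, 4], [1, 5], [2, 6], [3, 7]]
```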

Performance

Conclusion: With the configuration listed below and EPLB enabled, the round-robin strategy improves average throughput and end-to-end latency by approximately 3% over the default linear strategy.

Test Platform:
vLLM version: vllm/vllm-openai:nightly-8c546102658f97b10d13bcf25193b65edc6ea6ff
Model: DeepSeek-V2-Chat-0628
GPU: 8 × H20
Serving config:
python3 -u -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_PATH} \
    --trust-remote-code \
    --gpu-memory-utilization 0.85 \
    -tp 8 \
    --enable-expert-parallel \
    --enable-eplb \
    --expert-placement-strategy "round_robin"

Benchmark config: input_len=1024, output_len=128, request_rate=4, max_concurrency=4, num_prompts=32:
python3 ./bench_serving.py \
    --backend vllm \
    --dataset-name random \
    --model ${MODEL_PATH} \
    --random-input-len 1024 \
    --random-output-len 128 \
    --random-range-ratio 0.5 \
    --tokenizer ./tokenizer \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 4 \
    --max-concurrency 4 \
    --num-prompts 32 \
    --base-url http://127.0.0.1:8000 \
    --port 8000

(Screenshot: benchmark results.)

Accuracy Test

Tested with DeepSeek-V2-Chat-0628 on 8 × H20 with the following serving command:

python3 -u -m vllm.entrypoints.openai.api_server \
    --model ${model_path} \
    --trust-remote-code \
    --gpu-memory-utilization 0.85 \
    -tp 8 \
    --enable-expert-parallel \
    --enable-eplb \
    --expert-placement-strategy "round_robin"

Note: DeepSeek-V2 performs poorly on the chosen datasets; these results are only meant to verify that this PR has no impact on accuracy.

| Dataset | vLLM v0.10.1.1 | This PR |
| --- | --- | --- |
| AIME24 | 13.33% | 20.00% |
| GPQA | 41.91% | 44.44% |
| MATH500 | 72.20% | 72.40% |
(Screenshot: accuracy results.)

@cboss6 cboss6 requested a review from mgoin as a code owner September 27, 2025 04:31

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables the round-robin expert placement strategy for MoE models with EPLB enabled. The changes involve refactoring the expert placement strategy logic into a utility function and updating the EPLB state creation to support the round-robin strategy. The refactoring improves code organization. However, I've found a critical bug in the implementation of the round-robin placement logic that occurs when the number of experts is not divisible by the number of expert parallel ranks. This can lead to incorrect model behavior. A fix is suggested to ensure correctness.
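For context, a small illustration of the corner case flagged above (numbers are illustrative; this is not the suggested fix itself): when the expert count is not divisible by the EP size, a plain stride-by-`ep_size` mapping leaves ranks with unequal expert counts, so any bookkeeping that assumes an equal per-rank count can index the wrong experts.

```python
# Illustrative only: 10 logical experts across 4 EP ranks.
num_experts, ep_size = 10, 4

naive = [list(range(r, num_experts, ep_size)) for r in range(ep_size)]
print(naive)                     # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
print([len(r) for r in naive])   # [3, 3, 2, 2] -- per-rank counts differ
```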

@cboss6 cboss6 changed the title [Feat][EPLB] Enable Round-robin expert placement strategy with eplb enabled. [Feat][EPLB][Perf] Enable Round-robin expert placement strategy with eplb enabled. Sep 27, 2025
@cboss6 cboss6 changed the title [Feat][EPLB][Perf] Enable Round-robin expert placement strategy with eplb enabled. [Feat][EPLB][Perf] Enable Round-robin expert placement strategy while eplb is enabled. Sep 27, 2025
@cboss6 cboss6 requested a review from 22quinn as a code owner September 28, 2025 07:28

abmfy commented Sep 30, 2025

Thanks for the contribution!
@tlrmchlsmth Do you think we should consider using this strategy as the default when EPLB is enabled? My concern is that since this placement only affects the stage before the first EPLB rearrangement, it won’t actually bring any improvement. I’d prefer to keep the code simple.


cboss6 commented Sep 30, 2025

I don’t agree with the view that round-robin offers no improvement. Since round-robin mainly serves as a better initialization before EPLB adjusts the placement, my performance tests focused on the early stage (after at least one EPLB rearrangement), where average throughput and end-to-end latency improved by around 2.5–3%.

If we only consider the initial state, as far as I know, round-robin placement is generally better than linear placement for multi–expert-group models without redundant experts. Furthermore, since this PR carries no risk when EPLB is enabled, I believe it’s reasonable to make round-robin the default initial placement instead of linear.
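To make the argument concrete, a toy example (numbers chosen for illustration, not taken from the PR): with two expert groups of four experts each on four EP ranks, linear placement concentrates each group on half of the ranks, while round-robin spreads every group across all ranks, so load that skews toward one group is not concentrated on a subset of ranks before the first EPLB rearrangement.

```python
# Illustrative only: 8 experts, experts 0-3 in group A and 4-7 in group B,
# placed on 4 EP ranks.
num_experts, ep_size = 8, 4
group = lambda e: "A" if e < 4 else "B"

linear = [[2 * r, 2 * r + 1] for r in range(ep_size)]
round_robin = [list(range(r, num_experts, ep_size)) for r in range(ep_size)]

print([[group(e) for e in rank] for rank in linear])       # [['A','A'], ['A','A'], ['B','B'], ['B','B']]
print([[group(e) for e in rank] for rank in round_robin])  # [['A','B'], ['A','B'], ['A','B'], ['A','B']]
```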
